Communications Chemistry — Latest Matching Preprints

1

Function-guided design of active enzymes

Hu, M.; Wu, L.; Yang, Y.; Li, F.; Zhu, L.

2026-06-29 bioinformatics 10.64898/2026.06.27.735025 medRxiv

Top 0.1%

11.0%

Designing enzymes from functional descriptions remains challenging because catalytic activity is governed by sequence-structure-function relationships. Here we present EnzymeArt, a function-conditioned enzyme-design framework centred on a generative sequence model. EnzymeArt couples function-conditioned sequence generation with structure-guided refinement, annotation checks and substrate-aware computational prioritization to select candidates for synthesis and biochemical testing. Across alcohol dehydrogenase (ADH), malate dehydrogenase (MDH) and triacylglycerol lipase design campaigns, 57 of 60 synthesized designs showed crude-lysate activity above matched background controls. Purified representatives further showed quantitative steady-state catalytic activity. The best designed ADH reached kcat = 223.7/s and exceeded a wild-type reference under matched conditions, an MDH reached kcat = 267.57/s despite having only 33% sequence identity to its closest BLASTP hit, and a designed lipase hydrolysed both short- and long-chain triglycerides with apparent activity modestly above that of a commercial lipase reference. Together, these results establish a route for converting functional descriptions into experimentally validated enzyme designs with quantitative steady-state kinetic activity.

2

Minimal Data, Maximal Insight (MDMI): A Structure-guided Pipeline for Discovering Functional Alternatives in Peptide-Protein Interfaces

Bayat, P.; Perkins, S. J.; Clancy, S.; Patel, S. S.; Yin, R. F.; Bozovicar, K.; Singh, S.; Shrestha, S.; Moustafa, Z.; Zayani, R.; IWE, I.; Bayat, S.; Kelly, P.; Vigar, J. R. J.; White, V. Y.; Xie, M.; Simchi, M.; Palter, S.; Nguyen, J.; Zeisler, I. Y.; Wu, B.; Pardee, K.

2026-07-14 synthetic biology 10.64898/2026.07.13.737974 medRxiv

Top 0.1%

8.5%

Discovering functional peptides across vast sequence space remains a formidable challenge, particularly when experimental training data is scarce. We present Minimal Data Maximal Insight (MDMI), a two-stage structure-guided computational pipeline that designs functional peptide variants using only a small, annotated dataset. Rather than relying on sequence information alone, MDMI integrates three-dimensional structural features derived from predicted peptide-protein complexes into a machine learning model that captures interface geometry and binding energetics. This structure-aware predictor, paired with a genetic algorithm for sequence exploration, reduced false positives from 70% to close to zero in an all-negative benchmark panel compared with a sequence-only model in computational benchmarking, and produced approximately four-fold more high-confidence in silico binders than state-of-the-art peptide/protein design baselines. Using the split-GFP system as a testbed, where fluorescence provides a direct functional readout of peptide-protein complementation, MDMI identified peptides with up to 38% sequence divergence from wild-type in Stage 1 while retaining measurable activity. In Stage 2, motif-guided recombination of successful Stage 1 variants produced highly divergent yet functional peptides bearing over 50% sequence difference from wild-type, revealing two distinct functional clusters in sequence space. As further validation, a top-performing candidate expressed as a full-length GFP fusion retained a GFP-like emission profile, supporting formation of a fluorescent GFP-like scaffold. These results demonstrate that structure-informed pipelines can uncover remote functional sequence space from minimal data, with broad implications for peptide and therapeutic analog discovery.

3

From biofilms to birth: Quantitative murburn rationale for hydrated polymer-centred biological transduction, coherence, and evolution of complex life

Manoj, K. M.; Jaeken, L.; Tamagawa, H.; Burra, V. L. S. P.

2026-07-13 biochemistry 10.64898/2026.07.10.737745 medRxiv

Top 0.1%

8.1%

Hydrated extracellular polymeric phases (such as mucus, biofilms, and extracellular matrices) have traditionally been viewed as passive barriers. We complement and extend this view by analysing these systems through the murburn framework and liquid-liquid phase separation (LLPS) biophysics. Using quantitative modeling, we first demonstrate how frothy mucus in amphibian egg-masses enhances oxygen delivery while buffering diffusible reactive species (DRS), leading to improved developmental synchrony. We then model the human cervical mucus system, showing its cycle-dependent transitions between coherent barriers (pregnancy), active transduction media (ovulation), and controlled inflammatory remodeling (labor). Finally, we present thiolated polyglycerol sulfate (dPGS-SH) as a synthetic validation (another group's recently published work): this rationally designed mucolytic agent recapitulates native mucus's DRS-modulating properties and shows superior efficacy for addressing cystic fibrosis pathology. With such pan-systemic perspectives, we argue that phase-separated hydrated polymeric matrices represent one of evolution's most conserved solutions for regulating stochastic murburn chemistry, enabling organisms to exploit oxygen while preserving biological coherence. From biofilms to birth, this framework unifies the physicochemical basis of life's most fundamental processes.

4

Optical Screening Identifies Chemical Modulators of Intracellular α-synuclein Aggregation

Rothschild, L.; Giem, C.; Bajaj, A.; Luo, J. W.; Carey, K. L.; Deguine, J.; Xavier, R. J.

2026-07-10 cell biology 10.64898/2026.07.04.736150 medRxiv

Top 0.1%

8.1%

Parkinson's disease (PD) is a movement disorder characterized by the accumulation of alpha-synuclein aggregates leading to dopaminergic neuron loss in the substantia nigra. While PD has been associated with environmental and microbiome changes, our ability to assess the mechanistic impact of these factors on synuclein aggregation in cells has remained limited. Here, we designed and optimized a high-throughput optical screening system to assess the effect of metabolites and small molecules on synuclein aggregation in cell lines expressing a synuclein-fluorescent protein fusion and treated with pre-formed fibrils (PFFs). Using this assay, we identified several compounds that modulate synuclein aggregate accumulation in cells, including harman, a β-carboline that led to reduced synuclein aggregation. We further investigated the transcriptional effect of harman and PFFs and identified changes in peroxiredoxins as a potential mechanism linking harman to aggregate accumulation. Altogether, this work establishes a pipeline to prioritize small molecules that can impact synuclein aggregate formation.

5

A covalent irreversible inhibitor binds in two mutually exclusive conformations to the active-site cysteine residue of human aldehyde dehydrogenase 1A3

Covaleda, D.; Vizarraga, D.; Upadhyay, T.; Zhu, J.; Abegg, D.; Pequerul, R.; Hugo, M.; Adibekian, A.; Fita, I.; Pares, X.; Aviles, F. X.; Boggyo, M.; Farres, J.

2026-07-15 biochemistry 10.64898/2026.07.14.738401 medRxiv

Top 0.1%

7.9%

Aldehyde dehydrogenases (ALDH) are enzymes that catalyze the NAD(P)+-dependent oxidation of aldehydes into carboxylic acids, playing roles in detoxification, biosynthesis, and regulatory functions. Dysfunction of ALDH is associated with serious conditions such as alcohol intolerance, cancer, cardiovascular problems, and neurological disorders. In humans, ALDH1A1 and ALDH1A3 isoforms act as retinaldehyde dehydrogenases and are overexpressed in various cancers, where high levels are associated with increased tumor malignancy, cancer stem cell traits, and therapeutic resistance. ALDH1A3 is recognized as a promising target for anticancer therapies, with several inhibitors, mainly reversible, developed to specifically target it or the enzyme family. Since ALDH enzymes can also display esterase activity, we used this property to develop an in vitro assay specifically targeting the esterase function of ALDH1A3. A highly conserved active-site cysteine in ALDH1A3 is located at the bottom of two converging channels, which define the substrate- and cofactor-binding pockets. To target this catalytic cysteine, we screened a library of 3,200 cysteine-focused covalent fragments. This led to the identification of Z3405279217 (Z34), an acrylamide-based covalent compound that inhibits both ALDH1A1 and ALDH1A3 at sub-micromolar levels. Biochemical and biophysical tests confirmed that Z34 acts as a time-dependent, covalent, and irreversible binder to the active-site cysteine. In this work, we determined the cryo-EM structure of the ALDH1A3-Z34 complex at 2.26 Å resolution, confirming the covalent attachment of Z34 to the catalytic cysteine. Notably, two mutually exclusive covalent binding modes were observed: one occupying the substrate-binding pocket and the other the cofactor-binding region. Z34 displayed unexpected binding modes within the active site and holds promise as a lead compound for future drug development.

6

MAERM: Predicting Enzyme-Reaction Matching Relationships with a Mixed-Attention Model

Liu, T.; Zhai, S.; Lin, S.; Zhan, X.; Deng, J.; Liu, H.; Siu, S. W. I.

2026-07-10 bioinformatics 10.64898/2026.07.06.736902 medRxiv

Top 0.1%

7.8%

Harnessing enzyme specificity requires a thorough understanding of enzyme promiscuity, which determines enzymes' catalytic scope; however, measuring this scope still relies heavily on labor-intensive analytical approaches. While data-driven approaches have emerged to predict the catalytic scope of enzymes, these methods continue to face challenges such as restricted datasets and insufficient integration of enzyme structural information and reaction transformations. Here, we introduce MAERM, an innovative mixed-attention model designed to predict enzyme-reaction matching relationships. Built on our MAERM-DB, a dataset with broad coverage of validated and chemoenzymatic catalysis data, MAERM utilizes a local-global attention module to integrate multimodal enzyme information with fine-grained reaction representations, thereby predicting enzyme-reaction matching probabilities. Results show that MAERM consistently outperforms all baselines, with an average F1-score of 0.984. Notably, on challenging test samples with less than 40% sequence identity to the training set, MAERM outperforms the second-ranked model by 5.9% in F1-score. In addition, MAERM achieves the highest top-10 success rate of 51.7% on Enzyme-405 and the highest balanced accuracy of 0.697 on BioCat-547, further supporting its generalizability in enzyme screening and chemoenzymatic catalysis. Finally, MAERM can serve as an efficient scoring module. When integrated with ProteinMPNN, MAERM has successfully guided novel enzyme design for two carbonyl reduction reactions, resulting in enhanced catalytic potential for the native substrate and demonstrating broad compatibility. Overall, MAERM has the potential to reduce the experimental cost of measuring enzymes' catalytic scope, facilitate enzyme design, and ultimately accelerate the design-build-test-learn cycle in enzyme engineering.
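For readers unfamiliar with the two headline metrics in this abstract, the following is a minimal sketch of how an F1-score and a balanced accuracy are conventionally computed from binary confusion-matrix counts. The function names and the example counts are illustrative assumptions; the abstract does not describe the authors' evaluation code or how scores were averaged across datasets.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical counts for enzyme-reaction pairs, purely for illustration.
example_f1 = f1_score(tp=8, fp=2, fn=2)
example_bacc = balanced_accuracy(tp=8, fn=2, tn=5, fp=5)
print(f"F1 = {example_f1:.3f}, balanced accuracy = {example_bacc:.3f}")
```

Balanced accuracy weights positives and negatives equally, which is why it is often reported alongside F1 when the matching and non-matching classes are imbalanced.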

7

Fragment Based Active Site Exploration of Urethane Hydrolases Reveals a Diversity of Urethane Binding Modes

Bicer, D.; Kochubei, D.; Graham, R.; Pena-Diaz, S.; Rotilio, L.; Villadsen, N. L.; Sommerfeldt, A.; Johansen, M. B.; Sandahl, A.; Thirup, S. S.; Morth, J. P.; Otzen, D. E.

2026-07-07 biochemistry 10.64898/2026.07.06.734427 medRxiv

Top 0.1%

7.2%

Recent advances in the discovery, characterisation, and engineering of urethanases provide new opportunities for the sustainable biocatalytic degradation of polyurethane waste. A mechanistic understanding of enzyme-plastic interactions is essential for structure-based engineering to enhance urethanase activity. However, the extremely complex and hydrophobic nature of polyurethane makes it challenging to elucidate the structural basis of enzyme-plastic interactions. Here, we used a fragment-based approach to characterise the active sites of two novel urethanases with different catalytic scaffolds, employing both a crystallographic fragment-screening (FASE) campaign and soluble fragments of plastic-like analogues that mimic the substrate, transition state, or product. FASE identified new substrate-binding subpockets while interactions of plastic mimetics in the active site provided a mechanistic understanding of the recognition and binding of polyurethane fragments by these subpockets. These results highlight a diversity of binding modes among urethanases toward different polyurethane fragments.

8

A control-validated pan-proteome deep-learning pipeline nominates GPR35 as a candidate target of the orphan bacterial metabolite ligiamycin A

Martin, J.

2026-07-06 bioinformatics 10.64898/2026.07.01.735807 medRxiv

Top 0.1%

7.0%

Most microbial natural products with documented bioactivity lack an identified molecular target, which limits their development. We present an open, control-validated computational pipeline for natural-product target hypothesis generation. It combines a pan-proteome deep-learning drug-target interaction (DTI) model (a graph neural-network ligand encoder, an ESM-2 protein language-model encoder, and bidirectional cross-attention) with bias-corrected ranking and control-anchored molecular docking. Applying it to ligiamycin A, a 2022-described Streptomyces/Achromobacter co-culture decalin-amino-maleimide with no reported target, we find that the predicted interactions of the compound are dominated by class-A G-protein-coupled receptors. Using a drug with a known target (losartan) we identify and correct a frequent-hitter bias in the raw model; after correction the standout candidates are uniformly class-A GPCRs, led by the orphan receptor GPR35. Structure-based docking with matched positive and negative controls across three candidates corroborates GPR35 specifically: ligiamycin A scores comparably to the known GPR35 agonist zaprinast at the agonist pocket (-8.1 vs -8.3 kcal/mol; non-binder floor -5.5), whereas FFAR1 is excluded and histamine H2 is inconclusive. We propose GPR35 as a prioritized, experimentally testable target and release the workflow as a reusable tool. The result is a computational hypothesis that requires experimental validation.

9

Retention, not flux: endpoint confounding caps computational prediction of peptide skin penetration, with a delivery-aware reframing

Komianos, N.; Prakash, P.

2026-06-29 bioinformatics 10.64898/2026.06.25.734657 medRxiv

Top 0.1%

6.9%

Bioactive peptides are now central to cosmetic and dermatological actives, yet predicting whether a given sequence will reach its site of action in skin remains unsolved. We contend that the dominant framing, predicting a single binary "skin permeability" label from sequence, is ill-posed, and that this, rather than a shortage of modelling power, explains the field's stalled predictive performance. The scope of the claim is narrow: barrier-crossing propensity is a legitimate, learnable function of molecular structure, whereas the vehicle- and endpoint-agnostic binary label that the literature supplies is not. We support this with a first-principles analysis and a study of public-source data. First, the experimental endpoint most commonly reported, transdermal flux into a diffusion-cell receptor compartment (OECD Test Guideline 428), conflates two opposite outcomes (genuine deep delivery and undesired systemic transport) and is, for a cosmetic active, frequently a failure signal rather than a success signal. That receptor flux is an imperfect measure of cutaneous bioavailability is long established in dermatopharmacokinetics; our contribution is to show that the same confound, inherited through scraped labels, is what caps machine learning from sequence. Second, reported "permeability" is a property of the sequence x delivery-vehicle x measurement-compartment triad, two terms of which are usually unrecorded. Third, on public-source data, a physicochemical intrinsic-permeability estimate (Potts-Guy) carries no positive predictive signal for scraped penetration labels (grouped AUC 0.45, 95% CI 0.40-0.51); sequence-only classifiers plateau in the mid-0.70s with diminishing returns as labels accumulate (AUC 0.70-0.77); and the same descriptor pipeline on a clean single-endpoint membrane dataset scores materially higher (AUC 0.83, non-overlapping CI). 
Our proposed reframing separates barrier-crossing (data-driven, sequence-level) from depth-and-retention (physics-driven, delivery-aware) and treats intrinsic transdermal flux as a regulatory risk axis; we close by proposing a triad-annotated reporting schema and a seed benchmark.

10

EZSolver: Template-free prediction of polar enzymatic mechanisms via bidirectional flow matching and search

Kuo, L.-H.; Yang, J.; Arnold, F.

2026-07-09 bioinformatics 10.64898/2026.07.08.737313 medRxiv

Top 0.1%

6.7%

Predicting enzymatic reaction mechanisms is critical for understanding enzyme function and for designing and discovering new enzymes. Current computational predictors rely on deterministic, rule-based dictionaries, which perform well on in-distribution tasks but fail to generalize to out-of-distribution (OOD) chemistry. To address this limitation, we present EZSolver, a template-free, generative framework for polar enzymatic mechanism prediction. Powered by a flow matching predictor (EZFlow) and navigated by an evaluator-guided bidirectional beam search, EZSolver learns the chemistry of electron redistribution instead of memorizing rigid templates. Evaluated across diverse enzyme classes, EZSolver achieves a 60.0% accuracy and an 84.6% chemical plausibility rate for full mechanism prediction of unseen polar enzymatic reactions. While rule-based models collapse without predefined templates, EZSolver successfully extrapolates chemical knowledge to infer uncatalogued pathways, as demonstrated during rigorous OOD benchmarking. By illuminating enzymatic chemical mechanisms, EZSolver helps pave the way for automated prediction of enzyme function and discovery and design of novel biocatalysts for sustainable chemistry.

11

ADMET Property Prediction with Quantum-Inspired Preprocessing

Mansour, B.; Rafaelyan, G.

2026-07-05 bioinformatics 10.64898/2026.06.30.735582 medRxiv

Top 0.1%

6.7%

Accurate prediction of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties is a central challenge in early-stage drug discovery, where experimental determination remains costly and time-consuming. In this work, we propose a quantum-inspired preprocessing framework in which statistical dependencies among molecular descriptors are encoded into a parameterised many-body Hamiltonian, and the expectation values obtained by simulating its time evolution serve as additional inputs to a gradient-boosted ensemble model (CatBoost). Mutual information (MI) is used both to select the most informative descriptors and to set the coupling strengths of the Hamiltonian, so that the induced entanglement structure reflects empirically measured feature correlations; the evolution is realised with a short digitised-counterdiabatic schedule that generates a compact set of expectation-value features while keeping the circuit shallow. Before training, the resulting quantum-derived feature vectors are concatenated with the full MapLight descriptor set (concatenated ECFP, Avalon, and ErG fingerprints together with RDKit physicochemical properties). We evaluate the pipeline on the AqSolDB aqueous solubility benchmark from the Therapeutics Data Commons (TDC) platform, achieving a mean absolute error (MAE) of 0.746 ± 0.006 log(mol/L), which is within the reported error bars of the current top-performing model on the TDC leaderboard (MAE = 0.741 ± 0.013). Ablation experiments show that the quantum-derived features match classical second-degree polynomial interaction features derived from the same MI-selected subset, while forming a far more compact representation (85 quantum features versus up to 4,950 polynomial terms, an approximately 58-fold reduction). SHapley Additive exPlanations (SHAP) analysis identifies the physicochemical drivers of solubility predictions, offering interpretable insight into model behaviour.
These results demonstrate that MI-guided Hamiltonian feature extraction can reproduce the performance of strong classical interaction models on aqueous solubility while generating a compact, interpretable feature representation that is compatible with future quantum execution.
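The compactness figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 4,950 polynomial terms are the pairwise (second-degree) interaction terms among 100 MI-selected descriptors, since C(100, 2) = 4,950; the subset size of 100 is an inference from that number and is not stated in the abstract.

```python
from math import comb


def pairwise_interaction_terms(n_features: int) -> int:
    """Number of distinct two-feature products x_i * x_j with i < j."""
    return comb(n_features, 2)


# Assumption: 100 MI-selected descriptors (inferred, not stated in the abstract).
n_selected = 100
poly_terms = pairwise_interaction_terms(n_selected)

# Reported in the abstract: 85 quantum-derived features.
quantum_features = 85
reduction = poly_terms / quantum_features

print(f"{poly_terms} pairwise terms vs {quantum_features} quantum features")
print(f"reduction factor is about {reduction:.1f}x")
```

Under that assumption the ratio comes out slightly above 58, which agrees with the "approximately 58-fold reduction" stated in the abstract.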

12

Distillation enables scalable high-fidelity virtual screening across ultra-large chemical libraries

Dai, J.; Wang, Y.; Shan, N. L.; Mariani, M.; Yu, Z.; Yan, Q.; Golani, L. K.; Surovtseva, Y. V.; Lee, W. H.; Pusztai, L.

2026-07-03 bioinformatics 10.64898/2026.06.29.735361 medRxiv

Top 0.1%

6.6%

Accurate virtual screening of ultra-large chemical libraries remains challenging. Existing approaches rely on lower-fidelity scoring functions or sampling-based strategies that can limit predictive accuracy and bias the exploration of chemical space. Here, we present FastBindRank, a distillation-based framework that transfers the predictive power of the structure-based model Boltz-2 into an efficient sequence-based surrogate. Trained on ~1% of the 122-million-compound PubChem library, FastBindRank enables high-fidelity screening at scale. Applied to histone deacetylase 11 (HDAC11), FastBindRank substantially enriched high-confidence binders relative to the background chemical space. The lightweight model captured structural patterns associated with predicted binding, revealing structural determinants of binding. Under a comparable computational budget, FastBindRank achieved a 74-fold increase in hit rate and over a 30-fold increase in discovery yield over direct subset-based screening. Experimental validation confirmed the activity of two novel compounds. These results establish distillation as a practical strategy for scalable, high-fidelity virtual screening of ultra-large chemical libraries.

13

Agentic AI for Structural Elucidation and Discovery of Drug Metabolites from Mass Spectrometry Data

Wang, X.; Patan, A.; Zhao, H. N.; Charron-Lamoureux, V.; Shin, Y.; Petras, D.; Hong, Y.; Bowen, B. P.; Northen, T. R.; Dorrestein, P. C.; Wang, M.

2026-06-26 bioinformatics 10.64898/2026.06.23.734138 medRxiv

Top 0.1%

6.1%

The majority of chemical signals detected in public metabolomics repositories remain structurally undefined. Large language models (LLMs) are probabilistic systems whose capacity to generate outputs beyond their training data, which can cause hallucinations, also makes them potentially suited to hypothesizing structures for molecules that have never been described. We aimed to build a system that could harness this generative capacity of LLMs, combined with domain-specific tools and frameworks, to constrain hallucination and produce validated discoveries. We developed a GNPS2 agentic AI system that interprets LC-MS/MS data by integrating spectral alignment, molecular formula inference, rule-based structural enumeration, and machine learning-based spectrum prediction, and that translates natural language hypotheses from domain experts into dynamically generated analytical workflows. We demonstrate the annotation of unknown drug metabolites from public data guided by chemical hypotheses. The agent predicted, and we experimentally confirmed, a phosphorylated hydroxyzine and an acetaminophen-p-coumaric acid ester, and it identified two new oxidative ibuprofen-carnitine conjugates from public repositories. These results demonstrate that LLM-driven agentic reasoning, when combined with domain expertise, can indeed generate experimentally testable structural hypotheses for previously uncharacterized metabolites by leveraging pan-repository data.

14

Pep2Mol: 3D Molecule Generation Targeting Protein-Protein Interfaces with Diffusion Models

Yue, R.; Yang, Z.; Seabra, G.; Li, C.; Li, Y.

2026-06-29 bioinformatics 10.64898/2026.06.28.734975 medRxiv

Top 0.1%

5.7%

Protein-protein interactions (PPIs) are central to biological processes. Designing small molecules that modulate dysregulated PPIs holds strong promise for targeting undruggable proteins. However, existing structure-based drug design approaches focus on well-defined small-molecule binding pockets and struggle to generalize to large, shallow, and chemically complex PPI interfaces. Here, we introduce Pep2Mol, a diffusion-based generative model for 3D molecule design that targets orthosteric PPI sites by explicitly incorporating binding peptides or proteins as structural guidance, moving beyond conventional pocket-conditioned generation. To enable model development and benchmarking, we curate a large-scale, high-quality dataset of 10,956 experimentally resolved protein complex structure pairs, each pairing an orthosteric competitive ligand with a protein binder at overlapping receptor interfaces. Pep2Mol integrates two SE(3)-equivariant graph neural networks that encode protein-ligand and protein-peptide interactions respectively, and fuses these representations via attention-based conditioning to jointly guide the diffusion trajectory. Extensive evaluations demonstrate that Pep2Mol generates chemically valid ligands with state-of-the-art binding affinities, providing a strong foundation for small-molecule inhibitor design against challenging PPI interfaces.

15

DESI-MS-Based Analysis of Drug Distribution in Human Renal Cystic Tissue Using the Chorioallantoic Membrane (CAM) as a 3D In Vivo Model

Dettmer, K.; Hehemann, A. M. E.; Schueler, J.; Heckscher, S.; Gross, V.; May, M.; Nuebel, B.; Wullich, B.; Buchholz, B.; Werner, J. M.; Jantsch, J.; Gronwald, W.; Takats, Z.; Oefner, P. J.; Schmidt, K. M.; Haerteis, S.

2026-07-01 biochemistry 10.64898/2026.07.01.735776 medRxiv

Top 0.1%

5.6%

The chorioallantoic membrane (CAM) model represents a promising three-dimensional in vivo platform for preclinical drug testing in human tissues. In this study, we investigated whether the tissue penetration and distribution of benzbromarone, a known inhibitor of the Ca2+ activated chloride channel TMEM16A and potential therapeutic agent for autosomal dominant polycystic kidney disease (ADPKD), can be successfully visualized in human renal cyst tissue cultured on the CAM. To this end, desorption electrospray ionization mass spectrometry imaging (DESI-MSI) combined with an ultrahigh-resolution time-of-flight mass spectrometer was employed. We achieved spatially resolved molecular mapping of endogenous metabolites and lipids as well as the applied compound. MSI enabled clear differentiation between CAM and cystic tissue based on their distinct lipid profiles. Benzbromarone was reproducibly detected in the cyst specimens and exhibited selective accumulation along the cyst epithelium, which is considered the principal site of action. These observations were complemented by multivariate analyses including Uniform Manifold Approximation and Projection (UMAP), and sparse multinomial logistic zero-sum classification. The data-driven approach confirmed molecular differences between tissue types and allowed accurate classification of drug-treated and untreated regions. This study demonstrates that topically applied benzbromarone penetrates human renal cyst tissue in the CAM model and localizes to pharmacologically relevant tissue regions, notably the location of the Ca2+ activated chloride channel TMEM16A in the epithelial lining. The integration of high-resolution DESI-MSI with advanced statistical analysis provides a robust and label-free method to study drug distribution in human tissue grafts. Our findings contribute to the advancement of translational research in analytical chemistry and pharmacology.

16

PEPstrMOD2: Next-generation tertiary structure prediction of chemically modified and non-natural peptides

Jain, S.; Mehta, N. K.; Raina, S.; Kumar, P.; Varun; Raghava, G. P. S.

2026-07-06 bioinformatics 10.64898/2026.06.22.733733 medRxiv

Top 0.1%

5.3%

While most existing methods are limited to predicting the tertiary structures of proteins containing only canonical residues, the PEPstrMOD server (developed in 2015) pioneered structure prediction for chemically modified and non-natural peptides. Despite its widespread use, the original framework was restricted to peptides of 7 to 25 residues and relied on older backbone-prediction algorithms. To address these limitations, we present PEPstrMOD2, which introduces three major advancements over its predecessor. First, it replaces the original in-house coordinate generation with state-of-the-art deep learning (DL) algorithms, leveraging AlphaFold2 (AF2) and ESMFold (EF) for highly accurate initial structure prediction. Second, it greatly expands the accessible chemical space by incorporating a new AMBER force-field-compatible library of 257 post-translational modifications (PTMs), 428 non-canonical amino acids (NCAAs), and 243 terminal modifications. Third, by exploiting the native scalability of AF2 and EF, PEPstrMOD2 eliminates the original length restrictions, enabling the structural modeling of longer, complex therapeutic peptides and small proteins. We evaluated the performance of PEPstrMOD2 against state-of-the-art methods across three distinct peptide datasets. For the AfCyc dataset of 80 cyclic peptides, PEPstrMOD2 obtained a competitive average atom-level root-mean-square deviation (RMSD) of 2.05 angstroms, compared to 1.13 angstroms for AlphaFold3 (AF3) and 1.82 angstroms for AfCycDesign. For the modified-peptide ModPep433 dataset, PEPstrMOD2 outperformed AF3, achieving a lower average RMSD of 4.49 angstroms against 4.67 angstroms for AF3. Furthermore, on the ModPep16 benchmark, PEPstrMOD2 achieved an average RMSD of 2.50 angstroms, less than half that of the original PEPstrMOD (5.84 angstroms).
In summary, PEPstrMOD2 provides a powerful, high-throughput, and highly accurate platform to facilitate peptide-based drug development and structural biology research. While the original PEPstrMOD was restricted to a web server interface, PEPstrMOD2 is available as both an intuitive webserver and a standalone command-line tool via GitHub, featuring Docker support for easy deployment and reproducible, large-scale modeling pipelines (https://webs.iiitd.edu.in/raghava/pepstrmod/).
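The benchmark comparisons in this abstract are all expressed as atom-level RMSD. As a minimal sketch of what that quantity measures, the function below computes the RMSD between two equally sized lists of 3D atom coordinates. It assumes the two structures are already superimposed and that atoms are listed in matching order; the abstract does not say which superposition or atom-matching procedure the authors used, and the coordinates shown are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired atoms, in the input units."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 unit along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, reference))
```

In practice an optimal rigid-body superposition (for example the Kabsch algorithm) is usually applied before this calculation, so values computed without alignment would be an upper bound on the aligned RMSD.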

17

BoltzMol-1: Towards Reliable Virtual Screening for Fast and Cost-Effective Hit Discovery

Getz, N.; Smith, G.; Colgan, A.; Fan, V.; Cavalleri, L.; Capponi, F.; Wohlwend, J.; Gitter, A.; Kritzer, J.; Maiorano, M.; Wlodarchak, N.; Corso, G.; Passaro, S.

2026-07-06 biochemistry 10.64898/2026.07.04.736485 medRxiv

Top 0.1%

4.8%

We present BoltzMol-1, a small-molecule hit discovery pipeline centered on an optimized version of Boltz-2 and explicitly adapted for prospective discovery. Reliable hit discovery that generalizes across target classes (rather than only the well-characterized families that dominate existing ligand data) would broaden the range of biology accessible to small-molecule intervention and reduce reliance on resource-intensive high-throughput screening. Towards this goal, the system prioritizes compounds for rapid experimental validation by coupling model-driven ranking with streamlined procurement from commercial catalogs. To improve developability at the point of selection, we introduce a suite of ADMET models for kinetic solubility (logS), lipophilicity (logD), and Caco-2 permeability. These models act as an early triage layer, systematically filtering out compounds with unfavorable physicochemical and absorption properties prior to synthesis or purchase. Across a panel of ten targets (most with no representation in the underlying affinity training data) we observe strong prospective performance on challenging systems. Functional actives or binders were identified for 6 of 10 targets, despite modest experimental budgets of 28-96 compounds per target. These results include successes on receptors and enzymes traditionally considered difficult for structure- or ligand-based approaches. Collectively, this work establishes a practical framework for low-throughput, cost-constrained discovery campaigns capable of delivering chemically tractable binders with favorable property profiles.

18

ThermoFusion: A Multimodal Deep Learning Framework for Generalizable Prediction of Enzyme Thermostability

Wei, Y.; Eberini, I.; Meyer, F.

2026-07-07 bioinformatics 10.64898/2026.07.04.736494 medRxiv

Top 0.1%

4.7%

Show abstract

Protein thermostability is a critical property for both industrial and biomedical enzyme applications, yet experimental evaluation of mutation-induced stability changes remains laborious and costly. Here, we present ThermoFusion, a hybrid deep learning framework that integrates 3D protein structure embeddings from ThermoMPNN with sequence-based embeddings from the pretrained protein language model ESM2 to predict the effects of single-point mutations on protein stability (ΔΔG). ThermoFusion exhibits robust generalization, maintaining high predictive accuracy on out-of-distribution sequences with low identity to the training set, a scenario where many other machine learning models, including ThermoMPNN and state-of-the-art tools, perform poorly due to reliance on memorization. Benchmarking on a curated enzyme dataset comprising 105 enzymes and 3144 mutations shows that ThermoFusion reliably identifies stabilizing mutations while accurately predicting stability for enzymes beyond its training set. These results establish ThermoFusion as a powerful tool for rational enzyme design.

19

Improving oral dissolution kinetics of weakly basic vodobatinib via slurry conversion to an amorphous drug-polymer salt

DeLion, L.; Dasaro, S.; Baghbanbashi, M.; Zemlyanov, D.; Ristroph, K.

2026-06-27 bioengineering 10.64898/2026.06.26.734800 medRxiv

Top 0.1%

4.3%

Show abstract

Vodobatinib (VBN) is a weakly basic (pKa ≈ 2.3) anticancer treatment with poor enteric solubility and low oral bioavailability. This study demonstrates how an emerging polymeric amorphization technique, slurry conversion, can yield amorphous drug-polymer salts with enhanced dissolution rates. The technique had not previously been applied to a weakly basic drug, so design rules for this class of active were unknown. Two acidic polymers, poly(styrene sulfonic acid) (PSSA) and poly(acrylic acid) (PAA), were individually evaluated for salt formation with VBN. Formulation involved blending the drug and polymer in a 1:2 (v/v) ratio of a protic liquid to solvent and a 1:9 (w/w) ratio of solid to solvent. Design rules for effective combinations of solvents and protic liquids were developed and optimized to thread the needle between dissolution of all species and acid-base interactions, both of which were required to form amorphous salts. Drug loadings of 10%, 20%, and 40% by mass were tested. X-ray photoelectron spectroscopy was employed to evaluate protonation of the quinoline nitrogen atoms on VBN, a key indicator of successful salt formation. Powder X-ray diffraction was used to confirm that the resulting slurry contained amorphous VBN, and 1H NMR spectroscopy indicated residual solvent remained after drying, which remains an area for improvement. In dissolution kinetics tests in FeSSIF, the lead drug-polymer salt formulation achieved a concentration of dissolved VBN up to 140 µg/mL, an improvement of >35-fold compared to <4 µg/mL (LLD) for crystalline VBN. These results demonstrate that slurry conversion is a viable polymeric amorphization technique even for weakly basic drugs.
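Two numbers in this abstract can be reproduced with a short calculation: the fold improvement in dissolved concentration, and, via the Henderson-Hasselbalch relation, why a base with pKa ≈ 2.3 is almost entirely un-ionized at intestinal pH. The sketch below is an illustration, not the authors' analysis. The two pH values (1.6 for fasted gastric fluid, 5.0 for FeSSIF) are assumptions taken from commonly quoted biorelevant-media values, and the function names are hypothetical.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a monoprotic weak base in its protonated (ionized) form.

    Henderson-Hasselbalch for a base B with conjugate acid BH+:
    [BH+] / ([B] + [BH+]) = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fold_improvement(formulated: float, reference: float) -> float:
    """Ratio of dissolved concentrations, both in the same units."""
    return formulated / reference


PKA_VBN = 2.3  # from the abstract

# Assumed pH values for illustration; they are not stated in the abstract.
ASSUMED_PH = {"fasted gastric": 1.6, "fed intestinal (FeSSIF)": 5.0}

if __name__ == "__main__":
    for label, ph in ASSUMED_PH.items():
        print(f"{label}: {protonated_fraction(PKA_VBN, ph):.4f} protonated")

    # 140 ug/mL for the lead salt versus the <4 ug/mL detection limit reported
    # for crystalline VBN. Because the reference is an upper bound, the ratio
    # computed here is a lower bound on the true improvement.
    print(f"fold improvement > {fold_improvement(140.0, 4.0):.0f}")
```

Under these assumed pH values the drug is mostly protonated in the stomach and almost entirely neutral in the fed intestine, which is consistent with the poor enteric solubility the abstract describes.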

20

Generative Drug Design in a Loop with dtSFM

Reddy, S. T.

2026-07-08 synthetic biology 10.64898/2026.06.10.731501 medRxiv

Top 0.1%

4.3%

Show abstract

Directed evolution, consisting of iterative rounds of diversification, selection, and counter-selection, underlies modern protein and antibody engineering, yet small-molecule drug design still advances largely through high-throughput screening and medicinal-chemistry intuition. Transformer softmax attention is mathematically identical to the Boltzmann distribution that governs molecular binding at thermal equilibrium [1], an isomorphism that prescribes a sequence-native Specificity Foundation Model (SFM) [2]. This framework was recently applied across seven molecular recognition domains [3,4] and scaled into the drug-target SFM (dtSFM), the first to pair a full-scale encoder with a generative decoder [5]. Whether such a model can be driven, iteratively and under selection, to optimize leads rather than sample them once has not been shown. Here we present GenLoop, a closed generative drug design loop that turns single-pass generation into directed evolution of chemistry. dtSFM generates target-conditioned molecules and reranks them by their thermodynamic compatibility score. AlphaFold 3, which shares no architecture or training data with dtSFM, serves as an orthogonal structural verifier. Cheminformatics filters enforce developability, and generative evolution is performed on the structurally verified candidates, selecting for predicted binders and counter-selecting against off-target chemistry. Applied across twelve drug targets spanning pharmacologically distinct mechanism classes, GenLoop produced AlphaFold 3-verified designs that reached the structural confidence of the approved drug for five of the twelve targets, with the best designs at interface iPTM 0.93-0.98 and PAE 0.8-2.0 Å, and resolved paralog selectivity across nine targets. Two full disease campaigns followed.
For the cystic-fibrosis transmembrane conductance regulator, GenLoop designed nine developability-filtered and structurally novel lead candidates (iPTM up to 0.93, interface PAE 2.3 Å) targeting all three orthogonal sites of the approved drug Trikafta. For the GLP-1 receptor family, dtSFM engineered tunable single-, dual-, and triple-receptor incretin designs, yielding 23 central-pocket candidates that are structurally novel at median iPTM 0.89 and interface PAE 1.95 Å. GenLoop with dtSFM brings directed evolution to small molecules through computational-thermodynamic selection; wet-lab validation is the immediate next step.
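The softmax-Boltzmann correspondence that this abstract builds on can be checked numerically: a softmax over scores s_i gives the same probabilities as a Boltzmann distribution over energies E_i when s_i = -E_i / kT. The sketch below is a generic illustration and not dtSFM or GenLoop code. The three energies are made-up inputs, and the kT value assumes kcal/mol units near room temperature.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: exp(s_i) / sum_j exp(s_j)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(energies: list[float], kt: float) -> list[float]:
    """Boltzmann populations: exp(-E_i / kT) / Z, with Z the partition function."""
    weights = [math.exp(-e / kt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    energies = [-2.0, -1.0, 0.0]  # hypothetical binding energies, kcal/mol
    kt = 0.593                    # approximate kT in kcal/mol near 298 K

    p_boltzmann = boltzmann(energies, kt)
    p_softmax = softmax([-e / kt for e in energies])

    # The two distributions agree to floating-point precision.
    assert all(math.isclose(a, b) for a, b in zip(p_boltzmann, p_softmax))
    print([round(p, 4) for p in p_softmax])
```

In this reading, the temperature plays the role of the softmax scaling factor: a lower kT sharpens the distribution toward the lowest-energy state, much as a larger score scale sharpens attention weights.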